Merging satellite products and ground-based measurements is often required for obtaining precipitation datasets that simultaneously cover large regions with high density and are more accurate than pure satellite precipitation products. Machine and statistical learning regression algorithms are regularly utilized in this endeavour. At the same time, tree-based ensemble algorithms for regression are adopted in various fields for solving algorithmic problems with high accuracy and low computational cost. The latter can constitute a crucial factor for selecting algorithms for satellite precipitation product correction at the daily and finer time scales, where the size of the datasets is particularly large. Still, information on which tree-based ensemble algorithm to select in such a case for the contiguous United States (US) is missing from the literature. In this work, we conduct an extensive comparison between three tree-based ensemble algorithms, specifically random forests, gradient boosting machines (gbm) and extreme gradient boosting (XGBoost), in the context of interest. We use daily data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and the IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also use earth-observed precipitation data from the Global Historical Climatology Network daily (GHCNd) database. The experiments refer to the entire contiguous US and additionally include the application of the linear regression algorithm for benchmarking purposes. The results suggest that XGBoost is the best-performing tree-based ensemble algorithm among those compared. They also suggest that IMERG is more useful than PERSIANN in the context investigated.
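A minimal sketch of the kind of comparison described above, assuming synthetic gauge-versus-satellite data rather than the PERSIANN/IMERG and GHCNd datasets of the study; the predictor set, the synthetic relationship and all hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: compare tree-based ensembles and a linear benchmark for
# correcting a satellite precipitation product against gauge observations.
# All data and settings below are synthetic and illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import median_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 5000
# Hypothetical predictors: satellite precipitation estimate, gauge elevation,
# longitude and latitude of the gauge.
X = np.column_stack([
    rng.gamma(2.0, 3.0, n),      # satellite precipitation (mm/day)
    rng.uniform(0, 3000, n),     # elevation (m)
    rng.uniform(-125, -67, n),   # longitude (deg)
    rng.uniform(25, 49, n),      # latitude (deg)
])
# Synthetic "gauge" precipitation related to the predictors with noise.
y = 0.8 * X[:, 0] + 0.001 * X[:, 1] + rng.normal(0, 1.0, n).clip(-2, None)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "linear":  LinearRegression(),
    "rf":      RandomForestRegressor(n_estimators=200, random_state=0),
    "gbm":     GradientBoostingRegressor(n_estimators=200, random_state=0),
    "xgboost": XGBRegressor(n_estimators=200, learning_rate=0.1, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    err = median_absolute_error(y_te, model.predict(X_te))
    print(f"{name:8s} median absolute error: {err:.3f}")
```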
Predictions and forecasts of machine learning models should take the form of probability distributions, thereby increasing the amount of information communicated to end users. Although the application of probabilistic prediction and forecasting with machine learning models is becoming increasingly frequent in academia and industry, the related concepts and methods have not yet been formalized and structured under a unified view of the field. Here, we review the topic of predictive uncertainty estimation with machine learning algorithms, together with the related metrics (consistent scoring functions and proper scoring rules) for assessing probabilistic predictions. The review covers the period from the introduction of early statistical methods (linear regression and time series models, based on Bayesian statistics or quantile regression) to recent machine learning algorithms (including generalized additive models for location, scale and shape, random forests, boosting and deep learning algorithms) that are inherently more flexible. Reviewing the progress in the field accelerates our understanding of how to develop new algorithms tailored to users' needs, since the latest advancements build on a few fundamental concepts applied to more complex algorithms. We conclude by classifying the material and discussing challenges that are becoming a hot topic of research.
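As one concrete instance of the concepts reviewed above, the sketch below fits gradient-boosting quantile regression and scores the predicted quantiles with the pinball loss, a consistent scoring function for quantiles; the data and model settings are assumptions made for illustration.

```python
# Hedged sketch: predictive uncertainty via quantile regression, scored with
# the pinball loss. Data and hyperparameters are illustrative, not from the review.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_pinball_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2 + 0.05 * X[:, 0])   # heteroscedastic noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
for q in (0.05, 0.50, 0.95):
    model = GradientBoostingRegressor(loss="quantile", alpha=q, random_state=1)
    model.fit(X_tr, y_tr)
    score = mean_pinball_loss(y_te, model.predict(X_te), alpha=q)
    print(f"quantile {q:.2f}: mean pinball loss = {score:.3f}")
```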
Probabilistic forecasting is receiving increasing attention nowadays in a variety of applied fields, including hydrology. Several machine learning concepts and methods are relevant to formalizing and optimizing probabilistic forecasting implementations by addressing the related challenges. Nonetheless, a practically oriented review focusing on such concepts and methods is currently missing from the probabilistic hydrological forecasting literature. This absence persists despite the pronounced intensification of research efforts that benefit from machine learning in this same literature, and despite the substantial related progress that has recently emerged, especially in the field of probabilistic hydrological post-processing, which traditionally provides hydrologists with probabilistic hydrological forecasting implementations. Here, we aim to fill this specific gap. In our review, we emphasize key ideas and information that can lead to the effective popularization of the reviewed concepts and methods, as such an emphasis can support successful implementations and further scientific developments in the field. In the same forward-looking direction, we identify open research questions and propose ideas to be explored in the future.
Regression-based frameworks for streamflow regionalization are built around catchment attributes traditionally rooted in catchment hydrology, flood frequency analysis and their interplay. In this work, we deviate from this traditional path by formulating and extensively investigating regression-based streamflow regionalization frameworks that largely emerge from general-purpose time series features originating in data science, and more precisely from a diverse set of such features. We focus on 28 features, including (partial) autocorrelation, entropy, temporal variation, seasonality, trend, lumpiness, stability, nonlinearity, linearity, spikiness, curvature and others. We estimate these features for the daily temperature, precipitation and streamflow time series of 511 catchments and then merge them with traditional topographic, land cover, soil and geologic attributes. Precipitation and temperature features (e.g., the spectral entropy, the seasonality strength and the lag-1 autocorrelation of the precipitation time series, and the stability and trend strength of the temperature time series) prove to be useful predictors of many streamflow features. The same holds for traditional attributes such as the mean catchment elevation. Relationships between the predictor and dependent variables are also revealed, while the spectral entropy, the seasonality strength and several autocorrelation features of the streamflow time series are found to be more regionalizable than the others.
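A minimal sketch, under illustrative assumptions, of how two of the named general-purpose features (the lag-1 autocorrelation and an STL-based seasonality strength) could be computed for a daily series before being used as regionalization predictors; the synthetic series, the STL settings and the seasonality-strength definition follow common practice rather than the paper's exact implementation.

```python
# Hedged sketch: two general-purpose time series features for a daily series.
import numpy as np
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(2)
t = np.arange(3650)                                   # ten years of daily data
series = 10 + 5 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 2, t.size)

def lag1_autocorrelation(x):
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def seasonality_strength(x, period=365):
    # Strength of seasonality as 1 - Var(remainder) / Var(seasonal + remainder).
    fit = STL(x, period=period).fit()
    detrended = fit.seasonal + fit.resid
    return float(max(0.0, 1.0 - np.var(fit.resid) / np.var(detrended)))

print("lag-1 autocorrelation:", round(lag1_autocorrelation(series), 3))
print("seasonality strength:", round(seasonality_strength(series), 3))
```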
A comprehensive understanding of the behaviour of the various geophysical processes requires, among other things, detailed investigations across temporal scales. In this work, we propose a new time series feature compilation for advancing and enriching such investigations in a hydroclimatic context. This specific compilation facilitates largely interpretable investigations and comparisons in terms of temporal dependence, temporal variation, "forecastability", lumpiness, stability, nonlinearity (and linearity), trend, spikiness, curvature and seasonality. We compute the values of the proposed feature compilation across nine temporal resolutions (i.e., the 1-day, 2-day, 3-day, 7-day, 0.5-month, 1-month, 2-month, 3-month and 6-month resolutions) and three hydroclimatic time series types (i.e., temperature, precipitation and streamflow), for 34-year-long time series records from 511 geographical locations across the contiguous United States. Based on the obtained information and knowledge, we identify similarities and differences between the time series types with respect to the patterns characterizing the evolution of their feature values with increasing (or decreasing) temporal resolution. We deem the similarities in these patterns rather surprising. We also find that the spatial patterns emerging from feature-based time series clustering are largely similar across temporal scales, and we compare the usefulness of the features for clustering the time series at the various temporal resolutions. For most of the features, this usefulness varies considerably across temporal resolutions and time series types, thereby pointing to the need for multifaceted time series characterizations in hydroclimatic similarity investigations.
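A minimal sketch of the multi-resolution idea, assuming a synthetic daily series and the coefficient of variation as a simple temporal-variation feature; the resampling rules stand in for the paper's nine resolutions (the 0.5-month step is omitted for simplicity) and do not reproduce the study's actual feature set.

```python
# Hedged sketch: one feature computed for the same daily series aggregated to
# several coarser temporal resolutions. Series and resolutions are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
idx = pd.date_range("1980-01-01", periods=3650, freq="D")
daily = pd.Series(5 + 3 * np.sin(2 * np.pi * idx.dayofyear / 365.25)
                  + rng.gamma(2.0, 1.0, idx.size), index=idx)

for rule in ("1D", "2D", "3D", "7D", "MS", "2MS", "3MS", "6MS"):
    aggregated = daily.resample(rule).mean()
    cv = aggregated.std() / aggregated.mean()
    print(f"resolution {rule:>3s}: coefficient of variation = {cv:.3f}")
```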
Many, if not most, systems of interest in science are naturally described as nonlinear dynamical systems (DS). Empirically, we commonly access these systems through time series measurements, where often we have time series from different types of data modalities simultaneously. For instance, we may have event counts in addition to some continuous signal. While by now there are many powerful machine learning (ML) tools for integrating different data modalities into predictive models, this has rarely been approached so far from the perspective of uncovering the underlying, data-generating DS (aka DS reconstruction). Recently, sparse teacher forcing (TF) has been suggested as an efficient control-theoretic method for dealing with exploding loss gradients when training ML models on chaotic DS. Here we incorporate this idea into a novel recurrent neural network (RNN) training framework for DS reconstruction based on multimodal variational autoencoders (MVAE). The forcing signal for the RNN is generated by the MVAE which integrates different types of simultaneously given time series data into a joint latent code optimal for DS reconstruction. We show that this training method achieves significantly better reconstructions on multimodal datasets generated from chaotic DS benchmarks than various alternative methods.
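A much-simplified sketch of sparse teacher forcing, assuming a 1-D logistic-map series and using the raw observations as the forcing signal in place of the MVAE-generated latent code of the paper; model size, forcing interval and training settings are illustrative only.

```python
# Hedged sketch: while unrolling an RNN on a chaotic series, the predicted
# value is replaced by the observation every `forcing_interval` steps, which
# keeps loss gradients from exploding. All settings are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
T = 400
x = torch.empty(T)
x[0] = 0.3
for t in range(T - 1):
    x[t + 1] = 3.9 * x[t] * (1.0 - x[t])          # logistic map, chaotic regime

class OneStepRNN(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.cell = nn.GRUCell(1, hidden)
        self.readout = nn.Linear(hidden, 1)

    def forward(self, x_prev, h):
        h = self.cell(x_prev.view(1, 1), h)
        return self.readout(h).view(()), h

model = OneStepRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
forcing_interval = 10                              # sparse: force every 10th step

for epoch in range(5):
    h = torch.zeros(1, 32)
    x_hat = x[0]
    loss = 0.0
    for t in range(T - 1):
        if t % forcing_interval == 0:
            x_hat = x[t]                           # sparse teacher forcing
        x_hat, h = model(x_hat, h)
        loss = loss + (x_hat - x[t + 1]) ** 2
    optimizer.zero_grad()
    (loss / (T - 1)).backward()
    optimizer.step()
    print(f"epoch {epoch}: mse = {loss.item() / (T - 1):.4f}")
```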
Quantifying motion in 3D is important for studying the behavior of humans and other animals, but manual pose annotations are expensive and time-consuming to obtain. Self-supervised keypoint discovery is a promising strategy for estimating 3D poses without annotations. However, current keypoint discovery approaches commonly process single 2D views and do not operate in the 3D space. We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents, without any keypoint or bounding box supervision in 2D or 3D. Our method uses an encoder-decoder architecture with a 3D volumetric heatmap, trained to reconstruct spatiotemporal differences across multiple views, in addition to joint length constraints on a learned 3D skeleton of the subject. In this way, we discover keypoints without requiring manual supervision in videos of humans and rats, demonstrating the potential of 3D keypoint discovery for studying behavior.
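A minimal sketch of a joint-length (bone-length) constraint of the kind mentioned above, penalizing the temporal variance of limb lengths on a learned 3D skeleton; the keypoint tensor, edge list and loss weighting are illustrative placeholders, not the paper's architecture.

```python
# Hedged sketch: limb lengths of a skeleton should stay constant over time,
# so their variance across frames is penalized. Shapes and edges are illustrative.
import torch

def bone_length_loss(keypoints_3d, edges):
    """keypoints_3d: (frames, joints, 3); edges: list of (parent, child) pairs."""
    parents = keypoints_3d[:, [p for p, _ in edges], :]
    children = keypoints_3d[:, [c for _, c in edges], :]
    lengths = torch.linalg.norm(parents - children, dim=-1)   # (frames, bones)
    return lengths.var(dim=0).mean()                          # variation over time

frames, joints = 16, 8
keypoints = torch.randn(frames, joints, 3, requires_grad=True)
skeleton_edges = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5), (1, 6), (6, 7)]
loss = bone_length_loss(keypoints, skeleton_edges)
loss.backward()            # gradients flow back to the keypoint estimates
print("bone-length loss:", float(loss))
```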
We present hierarchical policy blending as optimal transport (HiPBOT). This hierarchical framework adapts the weights of low-level reactive expert policies, adding a look-ahead planning layer on the parameter space of a product of expert policies and agents. Our high-level planner realizes a policy blending via unbalanced optimal transport, consolidating the scaling of underlying Riemannian motion policies, effectively adjusting their Riemannian matrix, and deciding over the priorities between experts and agents, guaranteeing safety and task success. Our experimental results in a range of application scenarios from low-dimensional navigation to high-dimensional whole-body control showcase the efficacy and efficiency of HiPBOT, which outperforms state-of-the-art baselines that either perform probabilistic inference or define a tree structure of experts, paving the way for new applications of optimal transport to robot control. More material at https://sites.google.com/view/hipobot
Safety is a crucial property of every robotic platform: any control policy should always comply with actuator limits and avoid collisions with the environment and humans. In reinforcement learning, safety is even more fundamental for exploring an environment without causing any damage. While there are many proposed solutions to the safe exploration problem, only a few of them can deal with the complexity of the real world. This paper introduces a new formulation of safe exploration for reinforcement learning of various robotic tasks. Our approach applies to a wide class of robotic platforms and enforces safety even under complex collision constraints learned from data, by exploring the tangent space of the constraint manifold. Our proposed approach achieves state-of-the-art performance in simulated high-dimensional and dynamic tasks while avoiding collisions with the environment. We demonstrate safe real-world deployment on a Tiago++ robot, achieving remarkable performance in manipulation and human-robot interaction tasks.
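A minimal sketch of acting in the tangent space of a constraint manifold: a candidate exploration action is projected onto the null space of the constraint Jacobian so that, to first order, it cannot violate the constraint; the toy unit-sphere constraint and the pseudoinverse-based projection are illustrative assumptions, not the paper's learned collision constraints.

```python
# Hedged sketch: project an exploration action onto the tangent space of a
# toy constraint manifold (the unit sphere) via a null-space projector.
import numpy as np

def constraint_jacobian(q):
    # Jacobian of c(q) = ||q||^2 - 1, keeping q on the unit sphere.
    return 2.0 * q.reshape(1, -1)

def project_to_tangent_space(q, action):
    J = constraint_jacobian(q)                            # (n_constraints, n_dims)
    null_space_projector = np.eye(q.size) - np.linalg.pinv(J) @ J
    return null_space_projector @ action

rng = np.random.default_rng(4)
q = np.array([1.0, 0.0, 0.0])                             # a point on the manifold
raw_action = rng.normal(size=3)                           # unconstrained exploration step
safe_action = project_to_tangent_space(q, raw_action)

# First-order constraint drift J(q) @ a should be ~0 for the projected action.
print("drift (raw): ", (constraint_jacobian(q) @ raw_action).item())
print("drift (safe):", (constraint_jacobian(q) @ safe_action).item())
```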
Multi-objective, high-dimensional motion optimization problems are ubiquitous in robotics and benefit from informative gradients. To this end, all cost functions need to be differentiable. We propose learning task-space, data-driven cost functions as diffusion models. Diffusion models represent expressive multimodal distributions and exhibit proper gradients over the entire space. We exploit these properties for motion optimization by combining the learned cost function with other, potentially learned or hand-tuned, costs into a single objective function and jointly optimizing all of them by gradient descent. We demonstrate the benefits of the joint optimization on a set of complex grasp and motion planning problems, in comparison with approaches that decouple grasp selection from motion optimization.
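A minimal sketch of jointly optimizing a trajectory by gradient descent on a sum of differentiable costs, with a simple obstacle penalty standing in for the learned diffusion-model cost; the weights, shapes and optimizer settings are illustrative assumptions.

```python
# Hedged sketch: optimize a 2-D trajectory under summed differentiable costs.
import torch

torch.manual_seed(0)
n_waypoints = 32
start, goal = torch.tensor([0.0, 0.0]), torch.tensor([1.0, 1.0])
alphas = torch.linspace(0, 1, n_waypoints).unsqueeze(1)
trajectory = (start + alphas * (goal - start)
              + 0.05 * torch.randn(n_waypoints, 2)).requires_grad_(True)

obstacle_center, obstacle_radius = torch.tensor([0.5, 0.5]), 0.2

def total_cost(traj):
    smoothness = ((traj[1:] - traj[:-1]) ** 2).sum()                 # hand-tuned cost
    endpoints = ((traj[0] - start) ** 2).sum() + ((traj[-1] - goal) ** 2).sum()
    dist = torch.linalg.norm(traj - obstacle_center, dim=1)
    obstacle = torch.relu(obstacle_radius - dist).sum()              # stand-in for a learned cost
    return smoothness + 10.0 * endpoints + 5.0 * obstacle

optimizer = torch.optim.Adam([trajectory], lr=0.01)
for step in range(200):
    optimizer.zero_grad()
    cost = total_cost(trajectory)
    cost.backward()
    optimizer.step()
print("final cost:", cost.item())
```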